Data set

The data comes from https://data.rijksmuseum.nl/object-metadata/download/
This Comma Separated Values file (202020-rma-csv-collection.zip) provides a simple inventory of objects in the Rijksmuseum collection. It includes the object number and persistent identifier, as well as a single title, type, creator, date and image URL for each object.

Visualization goals

## [1] 5

The original data has the following columns:

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.6     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
data <- read_csv("rma-csv-collection.csv")
## Rows: 667894 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): objectInventoryNumber, objectPersistentIdentifier, objectTitle[1], ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data
# A tibble: 667,894 × 7

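Columns 3, 4, 5 and 6 should be kept. On inspecting objectCreationDate[1], there are many NA values and non-numeric strings. Because the original file is too large (over 100 MB), the data is downsized by keeping only every 50th row and dropping rows whose year is missing or irregular; the result is saved as rma-downsize.csv.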

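data <- data %>%
  # keep columns 3 to 6 under shorter names
  select(title = 3, type = 4, creator = 5, year = 6) %>%
  # record each row's position in the original file
  mutate(index = row_number()) %>%
  # drop irregular years such as 0000-00 (6 or more characters)
  # and keep every 50th row to shrink the file
  filter(nchar(year) < 6, index %% 50 == 0)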

write_csv(data, "rma-downsize.csv")
data

Current file sizes:

## total 139M
## -rwxrwxrwx 1 alexm alexm 1.3M Nov 17 17:35 project-description.nb.html
## -rwxrwxrwx 1 alexm alexm 2.5K Nov 18 23:36 project-description.Rmd
## -rwxrwxrwx 1 alexm alexm 137M Nov 18 22:08 rma-csv-collection.csv
## -rwxrwxrwx 1 alexm alexm 955K Nov 18 23:36 rma-downsize.csv
## -rwxrwxrwx 1 alexm alexm  13K Nov 13 17:54 sample.png
data <- read_csv("rma-downsize.csv")
## Rows: 11997 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): title, type, creator
## dbl (2): year, index
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data

The goal is to explore how the counts of the top 7 object types change over the years.

## [1]    1 5830

Reorder the types by count and choose the top 7 (or, if the counts of the top 7 are too close to each other, choose the 1st, 21st, 41st, ... in the ranked list).
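The selection described above could be sketched roughly as follows. This is only an illustration on a made-up toy tibble, not the actual data: `toy`, `n_top` and `k` are hypothetical names, and the small values (2 instead of 7, every 2nd instead of every 20th type) are chosen only to keep the example readable.

```r
library(dplyr)
library(forcats)

# Toy stand-in for the downsized data (values are made up for illustration)
toy <- tibble::tibble(
  type = c("print", "print", "print", "drawing", "drawing", "photo", "coin"),
  year = c(1650, 1700, 1880, 1600, 1750, 1900, 1368)
)

n_top <- 2  # the notes use 7; 2 keeps the toy example small
k <- 2      # the notes suggest 20 (1st, 21st, 41st, ...)

# One row per type, most frequent first
type_counts <- toy %>%
  count(type, sort = TRUE)

# Option 1: the n_top most frequent types
top_types <- type_counts %>%
  slice_head(n = n_top) %>%
  pull(type)

# Option 2: every k-th type in the ranked list
spread_types <- type_counts %>%
  slice(seq(1, n(), by = k)) %>%
  pull(type)

# Order factor levels by frequency and lump everything else into "Other"
toy_top <- toy %>%
  mutate(type = fct_lump_n(fct_infreq(type), n = n_top))
```

On the real data, `n_top` would be 7 and `k` would be 20; whether option 1 or option 2 gives the more readable chart can only be judged after looking at the actual counts.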

str(data$year)  # was chr [1:667894] in the original data; numeric after re-reading the downsized file
##  num [1:11997] 1880 1650 -1600 1600 1368 ...

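In the original data, year contains strings and NA values. To clean the data:
- filter out rows whose year is not a number
- try to keep the x axis continuous; if that doesn’t work, bin the years into ranges (e.g. with cut(); fct_lump() merges infrequent factor levels rather than numeric ranges)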
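A minimal sketch of this cleaning step, under the assumption that the year arrives as a character column. The tibble `raw`, the columns `year_num` and `period`, and the 100-year bin width are all hypothetical choices for illustration; the sample values are made up.

```r
library(dplyr)

# Toy stand-in for a raw year column with strings and NA (made-up values)
raw <- tibble::tibble(
  year = c("1880", "1650", "-1600", "0000-00", "c. 1700", NA)
)

clean <- raw %>%
  # non-numeric strings become NA; the coercion warning is expected here
  mutate(year_num = suppressWarnings(as.numeric(year))) %>%
  # drop rows whose year is not a number
  filter(!is.na(year_num))

# Fallback if a continuous x axis does not work: bin years into ranges
# (100-year bins are an arbitrary choice for this sketch)
clean <- clean %>%
  mutate(
    period = cut(
      year_num,
      breaks = seq(-1600, 1900, by = 100),
      include.lowest = TRUE,  # keep the lowest year inside the first bin
      dig.lab = 5             # avoid scientific notation in bin labels
    )
  )
```

The numeric column can go straight onto a continuous x axis, while `period` is a factor that suits a discrete axis.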

The visualization will be something like the sample image (not shown here), but hopefully fancier and with annotations that stay in the right places.
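One possible starting point for such a chart is sketched below. It is an assumption about the final design, not the planned figure: the toy data, the 50-year bins, the chart type and the annotation text are all made up. Placing the annotation with `annotate()` in data coordinates is one way to keep it attached to the same point when the plot is resized.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the cleaned data (values are made up for illustration)
toy <- tibble::tibble(
  type = c("print", "print", "print", "drawing", "drawing", "photo"),
  year = c(1650, 1655, 1880, 1600, 1750, 1900)
)

# Count objects per type in 50-year bins (bin width is an arbitrary choice)
per_period <- toy %>%
  mutate(period = floor(year / 50) * 50) %>%
  count(type, period)

p <- ggplot(per_period, aes(x = period, y = n, colour = type)) +
  geom_line() +
  geom_point() +
  # x and y are in data units, so the label follows the data point
  annotate("text", x = 1650, y = 2, label = "hypothetical annotation",
           vjust = -1) +
  labs(x = "Year (50-year bins)", y = "Number of objects", colour = "Type")
```

Printing `p` renders the chart; types with a single point will show only a dot, since a line needs at least two observations.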